6 research outputs found

    Extending the VEF traces framework to model data center network workloads

    Data centers are a fundamental infrastructure in the Big-Data era, where applications and services demand large amounts of data and minimal response times. The interconnection network is an essential subsystem of the data center: it must guarantee high communication bandwidth and low latency for the applications' communication operations, or it becomes the system bottleneck. Simulation is widely used to model network functionality and to evaluate network performance under specific workloads. Apart from modeling the network, it is essential to characterize the communication pattern of the end nodes, which helps identify bottlenecks and flaws in the network architecture. In previous works, we proposed the VEF traces framework: a set of tools that capture the communication traffic of MPI-based applications and generate traffic traces used to feed network simulators. In this paper, we extend the VEF traces framework with new communication workloads such as deep-learning training applications and online data-intensive workloads. Funded by the Ministerio de Ciencia e Innovación and the Agencia Estatal de Investigación (MCIN/AEI/10.13039/501100011033), R&D Project Grant PID2019-109001RA-I00. Open-access publication funded by the Consorcio de Bibliotecas Universitarias de Castilla y León (BUCLE), under Operational Programme 2014ES16RFOP009 FEDER 2014-2020 DE CASTILLA Y LEÓN, Action: 20007-CL - Apoyo Consorcio BUCL

    Control de Congestión Eficiente para Redes HPC con Encaminamiento Adaptativo

    The interconnection network is the main element in high-performance computing (HPC) clusters and data centers (DC), where thousands of nodes must communicate quickly and reliably. Network performance depends on several design choices, such as the topology, the routing algorithm, and the switch architecture. Highly efficient routing algorithms, both deterministic and adaptive, have been proposed in the literature to balance traffic flows intelligently according to the network topology, but their performance drops in scenarios where congestion and its negative effects (for example, HoL blocking) appear. In particular, in scenarios where congestion is intense and persistent, HoL blocking can drastically degrade the performance of adaptive routing algorithms, since they may spread congested traffic flows across all available routes. Moreover, as we have shown in previous studies, the spreading of congested flows can degrade the performance of the static queuing schemes used to reduce HoL blocking by separating flows into different queues of the switch buffer. Indeed, because these schemes rely on a static criterion, defined before traffic is injected into the network, they cannot prevent congested and non-congested flows from sharing queues when combined with adaptive routing. In this work, we propose using some existing static queuing schemes together with dynamic virtual channel (VC) assignment to isolate in a single VC the flows whose routes have been adaptively routed, in order to prevent the impact of congestion from spreading across several routes. Basically, adapted flows are moved to a special adapted-flow channel (AFC), so that they do not interact with the flows assigned to other VCs by the static queuing scheme. In this way, the HoL blocking that adapted flows could cause to non-adapted flows is avoided, even if the congested flows have spread across several routes. Meanwhile, the static queuing scheme reduces, without any interference, the HoL blocking that may appear among non-adapted flows. To evaluate our proposal, we have carried out simulation experiments modeling large interconnection networks based on the Fat-tree topology. From the results obtained, we can conclude that our technique efficiently and significantly reduces the impact of HoL blocking in interconnection networks that use adaptive routing and queuing schemes when congestion appears.

    An effective and feasible congestion management technique for high-performance MINs with tag-based distributed routing

    As parallel computing systems increase in size, the interconnection network is becoming a critical subsystem. The current trend in network design is to use as few components as possible to interconnect the end nodes, thereby reducing cost and power consumption. However, this increases the probability of congestion appearing in the network. As congestion may severely degrade network performance, the use of a congestion management mechanism is becoming mandatory in modern interconnects. One of the most cost-effective proposals to deal with the problems derived from congestion situations is the Regional Explicit Congestion Notification (RECN) strategy, based on using special queues to totally isolate the packet flows which contribute to congestion, thereby preventing the Head-of-Line (HoL) blocking effect that these flows may cause to others. Unfortunately, RECN requires the use of source-based routing, thus not being suitable for interconnects with distributed routing, like InfiniBand. Although some RECN-like mechanisms have been proposed for distributed-routing networks, they are not scalable due to the huge amount of control memory that they require in medium-size or large networks. In this paper, we propose Distributed-Routing-Based Congestion Management (DRBCM), a new scalable technique which, following the RECN principles, totally prevents congestion from producing HoL-blocking in multistage interconnection networks (MINs) using tag-based distributed routing. Simulation results indicate that, regardless of network size, DRBCM presents small resource requirements to keep network performance at maximum level even in scenarios of heavy congestion, where it utterly outperforms (with a gain up to 70 percent) current solutions for distributed-routing networks, like the InfiniBand congestion-control mechanism based on injection throttling. 
Thus, DRBCM is an efficient, cost-effective, and scalable solution for congestion management. This work was jointly supported by the MEC, MICINN (currently MINECO), and European Commission under the projects Consolider Ingenio-2010-CSD2006-00046 and TIN2009-14475-C04, and by the JCCM under projects PCC08-0078 (PhD. grant A08/048) and POII10-0289-3724. Escudero-Sahuquillo, J.; Garcia Garcia, P.; Quiles Flor, FJ.; Flich Cardo, J.; Duato Marín, JF. (2013). An effective and feasible congestion management technique for high-performance MINs with tag-based distributed routing. IEEE Transactions on Parallel and Distributed Systems. 24(10):1918-1929. https://doi.org/10.1109/TPDS.2012.303

    Adaptive Routing in InfiniBand Hardware

    Interconnection networks are the communication backbone of modern high-performance computing systems, and an optimised interconnection network is crucial for the performance and utilisation of the system as a whole. One element of the interconnection network is the routing algorithm, which directly influences how well the physical network topology can be utilised. InfiniBand is one of the most common network architectures used in high-performance computing, and traditionally it only supported static routing. For multi-path networks such as Fat-trees, static routing is inefficient because it can neither balance traffic in real time nor utilise multiple paths efficiently under adversarial traffic. This potentially leads to unnecessary contention and an underutilised network, which has prompted numerous proposals to avoid the problem by using adaptive routing. Adaptive routing has recently been introduced in InfiniBand, and in this paper we evaluate to what extent its expected benefits hold for InfiniBand. Through a set of experiments on HDR InfiniBand equipment, we describe the basic behaviour of adaptive routing in InfiniBand, its benefits in Fat-tree topologies, and the unfortunate side effects related to unfairness that adaptive routing in general might introduce, including such phenomena as the reverse parking lot problem and congestion spreading.

    Efficient and Cost-Effective Hybrid Congestion Control for HPC Interconnection Networks

    Interconnection networks are key components in high-performance computing (HPC) systems, and their performance strongly influences that of the overall system. However, at high load, congestion and its negative effects (e.g., head-of-line blocking) threaten the performance of the network, and therefore that of the entire system. Congestion control (CC) is crucial to ensure efficient utilization of the interconnection network during congestion situations. As one major trend is to reduce the effective wiring in interconnection networks to cut cost and power consumption, the network will operate very close to its capacity. Thus, congestion control becomes essential. Existing CC techniques can be divided into two general approaches. One is to throttle traffic injection at the sources that contribute to congestion, and the other is to isolate the congested traffic in specially designated resources. However, the two approaches have different, non-overlapping weaknesses: injection throttling techniques react slowly to congestion, while isolating traffic in special resources may lead the system to run out of those resources. In this paper we propose EcoCC, a new Efficient and Cost-Effective CC technique that combines injection throttling and congested-flow isolation to minimize their respective drawbacks and maximize overall system performance. This new strategy is suitable for current commercial switch architectures, where it could be implemented without requiring significant complexity. Experimental results, using simulations under synthetic and real trace-based traffic patterns, show that this technique improves by up to 55 percent over some of the most successful congestion control techniques. This work has been jointly supported by the MINECO and European Commission (FEDER funds) under the project TIN2012-38341-C04-04, by the JCCM under the project POII10-0289-3724, by Simula Research Labs under the Research & Development Agreement UCTR120207 with UCLM, and by the HiPEAC consortia by means of a collaboration grant. Escudero Sahuquillo, J.; Gran, EG.; Garcia Garcia, P.; Flich Cardo, J.; Skeie, T.; Lysne, O.; Quiles Flor, FJ.... (2015). Efficient and Cost-Effective Hybrid Congestion Control for HPC Interconnection Networks. IEEE Transactions on Parallel and Distributed Systems. 26(1):107-119. https://doi.org/10.1109/TPDS.2014.2307851

    Experiencias en la organización de un taller práctico usando Raspberry Pi para fomentar el interés entre los alumnos en la materia de Ingeniería de Computadores

    The Computer Science degree in our Faculty offers four specialization profiles, including Computer Engineering (CE). In 2015, the number of students enrolled in CE subjects was around 5, far from the maximum of 25 established in the degree regulations. To awaken students' interest in CE subjects, some professors who teach these and related subjects proposed organizing a practical workshop outside lecture hours. The workshop introduces the contents of the CE subjects using single-board computers (SBC), specifically the Raspberry Pi ecosystem, which has aroused great interest in teaching over the last decade thanks to its low cost and ease of acquisition. As this article reports, after five editions, and with the sixth under way at the time of submission, the number of students enrolled in the CE specialization subjects has increased significantly, and the proposed workshop has objectively contributed to this.